Automatic Face Recognition System Based on Local Fourier-Bessel Features 
Abstract

We present an automatic face verification system inspired by
known properties of biological systems. In the proposed
algorithm, the image is converted from the spatial domain to
the polar frequency domain by a Fourier-Bessel Transform (FBT).
We compare a global analysis, which uses the whole image, with
a local analysis, which uses only selected face regions. The
resulting representations are embedded in a dissimilarity
space, where each image is represented by its distance to all
the other images, and a pseudo-Fisher discriminator is built.
Verification tests on the FERET database showed that the
local-FBT algorithm outperforms the global-FBT version. The
local-FBT algorithm performed on par with state-of-the-art
methods under different testing conditions, indicating that
the proposed system is highly robust to expression, age, and
illumination variations. We also evaluated the performance of
the proposed system under strong occlusion conditions and found
that it is highly robust for up to 50% of face occlusion.
Finally, we fully automated the verification system by
implementing face and eye detection algorithms. Under this
condition, the local approach was only slightly superior to
the global approach.



1 Introduction 

Face verification and recognition tasks are highly complex due
to the many possible variations of the same subject under
different conditions, such as illumination, facial expression,
and age. Many developers of face recognition algorithms have
adopted a biologically inspired approach to these problems
(for a review, see [2]), thus contributing both to the
understanding of human face processing and to the construction
of efficient face recognition technologies.

The approach described in the present paper was inspired by
developments in neurophysiology and cognitive psychology, and
its fundamentals were first described in [31]. It is based on
an image representation that may be analogous to those used by
the human visual system (HVS). In particular, we evaluated the
performance of a face verification system whose primary
features were the magnitudes of the radial and angular
components of face images, represented in a dissimilarity
space. The main contribution of this paper is the proposal to
use a local analysis approach, in contrast to the previously
used global analysis approach (see footnote 1). We show that a
system based on the new method achieves a state-of-the-art
performance level. Moreover, we demonstrate that the proposed
system is robust to typical variations in face images, such as
facial expression, age, illumination, and partial occlusion.

The paper is organized as follows: in the next section, we
briefly introduce the reader to the primary spatial processing
by the HVS and to the related literature. We describe the
Fourier-Bessel Transform (FBT) in Section 3 and the proposed
algorithm in Section 4. We introduce the face database and
testing methods in Section 5. The experimental results are
presented in Section 6, and in the last section we discuss the
results and ongoing work.

2 Background and previous work 

Most current face recognition and verification algorithms are
based on feature extraction from a Cartesian perspective,
typical of most analog and digital imaging systems. The HVS, on
the other hand, is known to process visual stimuli by
fundamental shapes defined in polar coordinates. In the early
stages, the visual image is filtered in a linear manner by
neurons tuned to specific spatial frequencies and locations
[4]. In further stages, the output of these neurons is
processed to extract global and more complex shape
information, such as faces [19]. Electrophysiological
experiments in the visual cerebral areas of monkeys showed that
the fundamental patterns for global shape analysis are defined
in polar and hyperbolic coordinates [11]. Global pooling of
orientation information was also shown by psychophysical
experiments to be responsible for the detection of angular and
radial Glass dot patterns [27]. Thus, it is evident that
information regarding the global polar content of images is
effectively extracted by, and is available to, the HVS. In [31]
we introduced the representation of face images in the polar
frequency domain by global two-dimensional FBT features.
However, one of the disadvantages of global feature extraction
is the coarse representation of peripheral regions. The HVS
compensates for this effect by eye saccades, moving the fovea
from one point to another in the scene. Here we propose to
apply the FBT at strategic regions [13], namely the eye region.
Moreover, we also integrated face and eye detection algorithms,
which makes the verification system completely automatic.

1 Partial results based on a preliminary version of the system
were submitted in

3 Fourier-Bessel analysis 

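The FB series [9] is useful for describing the radial and
angular components of images. FBT analysis starts by converting
the coordinates of a region of interest from Cartesian (x, y)
to polar (r, θ). The function f(r, θ) is represented by the
two-dimensional FB series, defined as

\[
f(r,\theta) = \sum_{i=1}^{\infty} A_{0,i}\, J_0\!\left(\alpha_{0,i}\frac{r}{R}\right)
 + \sum_{n=1}^{\infty}\sum_{i=1}^{\infty} J_n\!\left(\alpha_{n,i}\frac{r}{R}\right)
 \left[ A_{n,i}\cos(n\theta) + B_{n,i}\sin(n\theta) \right] \tag{1}
\]

where $J_n$ is the Bessel function of order n, f(R, θ) = 0, and
0 ≤ r ≤ R. $\alpha_{n,i}$ is the ith root of the $J_n$ function,
i.e. the zero-crossing value satisfying $J_n(\alpha_{n,i}) = 0$.
R is the radial distance to the edge of the image. The
orthogonal coefficients $A_{n,i}$ and $B_{n,i}$ are given by

\[
A_{0,i} = \frac{1}{\pi R^2 J_1^2(\alpha_{0,i})}
 \int_{\theta=0}^{2\pi}\int_{r=0}^{R} f(r,\theta)\, r\,
 J_0\!\left(\alpha_{0,i}\frac{r}{R}\right) dr\, d\theta \tag{2}
\]

for n = 0 (with $B_{0,i} = 0$), and

\[
\begin{bmatrix} A_{n,i} \\ B_{n,i} \end{bmatrix}
 = \frac{2}{\pi R^2 J_{n+1}^2(\alpha_{n,i})}
 \int_{\theta=0}^{2\pi}\int_{r=0}^{R} f(r,\theta)\, r\,
 J_n\!\left(\alpha_{n,i}\frac{r}{R}\right)
 \begin{bmatrix} \cos(n\theta) \\ \sin(n\theta) \end{bmatrix} dr\, d\theta \tag{3}
\]

for n > 0.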
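As a rough numerical illustration of the FB expansion, the following Python sketch estimates the coefficients A(n,i) and B(n,i) of a function sampled on a polar grid. It assumes the standard normalization of the FB series on a disk of radius R with f(R, θ) = 0. The helper names (`bessel_j`, `bessel_zeros`, `fb_coefficients`), the grid sizes, and the NumPy-only evaluation of the Bessel functions through Bessel's integral are choices made for this example and are not taken from the original system.

```python
import numpy as np

def bessel_j(n, x, m=512):
    """J_n(x) for integer n, from Bessel's integral with the midpoint rule."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    tau = (np.arange(m) + 0.5) * np.pi / m
    integrand = np.cos(n * tau[:, None] - x.ravel()[None, :] * np.sin(tau)[:, None])
    return integrand.mean(axis=0).reshape(x.shape)

def bessel_zeros(n, count):
    """First `count` positive roots of J_n: sign-change scan, then bisection."""
    # J_n has no root in (0, n], so the scan can start at max(n, tiny).
    xs = np.linspace(max(n, 1e-9), n + 10.0 * (count + 1), 4000)
    vals = bessel_j(n, xs)
    brackets = np.where(vals[:-1] * vals[1:] < 0)[0][:count]
    roots = []
    for k in brackets:
        lo, hi = xs[k], xs[k + 1]
        f_lo = bessel_j(n, lo)[0]
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            f_mid = bessel_j(n, mid)[0]
            if f_lo * f_mid <= 0:
                hi = mid
            else:
                lo, f_lo = mid, f_mid
        roots.append(0.5 * (lo + hi))
    return np.array(roots)

def fb_coefficients(f, r, theta, R, n_max, i_max):
    """Estimate A[n, i-1] and B[n, i-1] from samples f[k, l] = f(r[k], theta[l])."""
    dr, dtheta = r[1] - r[0], theta[1] - theta[0]
    A = np.zeros((n_max + 1, i_max))
    B = np.zeros((n_max + 1, i_max))
    for n in range(n_max + 1):
        cos_n, sin_n = np.cos(n * theta), np.sin(n * theta)
        for i, alpha in enumerate(bessel_zeros(n, i_max)):
            radial = r * bessel_j(n, alpha * r / R)          # r * J_n(alpha r / R)
            norm = (1.0 if n == 0 else 2.0) / (np.pi * R**2 * bessel_j(n + 1, alpha)[0] ** 2)
            A[n, i] = norm * np.sum(radial[:, None] * cos_n[None, :] * f) * dr * dtheta
            B[n, i] = norm * np.sum(radial[:, None] * sin_n[None, :] * f) * dr * dtheta
    return A, B

# Synthetic check: build f from two known FB terms and recover their weights.
R = 1.0
r = (np.arange(400) + 0.5) * R / 400           # midpoint grid in radius
theta = np.arange(128) * 2.0 * np.pi / 128     # uniform grid in angle
a02 = bessel_zeros(0, 2)[1]                    # second root of J_0
a21 = bessel_zeros(2, 1)[0]                    # first root of J_2
f = (0.5 * bessel_j(0, a02 * r / R)[:, None] * np.ones_like(theta)[None, :]
     + 1.5 * bessel_j(2, a21 * r / R)[:, None] * np.cos(2 * theta)[None, :])
A, B = fb_coefficients(f, r, theta, R, n_max=2, i_max=2)
print(np.round(A, 3))
print(np.round(B, 3))
```

In this synthetic check the recovered weights should sit close to 0.5 at A[0, 1] and 1.5 at A[2, 0], with the remaining entries near zero. For real images the polar resampling of the Cartesian pixels would have to be added, and a library Bessel routine would normally replace the integral-based helper.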

4 Face verification using FBT

The proposed algorithm is based on image registration and
normalization, followed by two feature extraction steps and
the construction of a classifier. After the first two steps,
we extract the FB coefficients from the images, compute the
pairwise Euclidean distance between all the FBT
representations, and redefine each object by its distance to
all other objects. In the last stage we train a pseudo-Fisher
classifier. We tested this algorithm on the whole image
(global analysis) and on the combination of three facial
regions (local analysis).

4.1 Face registration and normalization 

Face representation requires prior image registration and,
usually, spatial and luminance normalization as pre-processing.
Assuming the sample images contain a single face, we detected
the head with a cascade of classifiers [26] and estimated the
location of the eye region with an Active Appearance Model
algorithm [3]. Within this region, we used flow field
information [14] to determine the eye centers. Using the eye
coordinates, we translated, rotated, and scaled the images so
that the eyes were registered at specific pixels. Next, the
images were cropped to 130 x 150 pixels and a mask was applied
to remove most of the hair and background. The unmasked region
was histogram equalized and normalized to zero mean and unit
standard deviation.
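The registration and normalization steps can be sketched as follows. This is only an illustration: the detected and target eye positions, the rectangular stand-in mask, and the rank-based form of histogram equalization are assumptions made for the example, and the function names are hypothetical.

```python
import numpy as np

def eye_alignment_matrix(left_eye, right_eye, target_left, target_right):
    """2x3 similarity transform (rotation, scale, translation) mapping the
    detected eye centers onto the target pixel positions."""
    src = np.subtract(right_eye, left_eye).astype(float)
    dst = np.subtract(target_right, target_left).astype(float)
    scale = np.hypot(dst[0], dst[1]) / np.hypot(src[0], src[1])
    angle = np.arctan2(dst[1], dst[0]) - np.arctan2(src[1], src[0])
    c, s = scale * np.cos(angle), scale * np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    shift = np.asarray(target_left, dtype=float) - rot @ np.asarray(left_eye, dtype=float)
    return np.hstack([rot, shift[:, None]])

def normalize_unmasked(image, mask):
    """Equalize the pixels where mask is True (rank-based variant), then
    scale them to zero mean and unit standard deviation."""
    values = image[mask].astype(float)
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    equalized = ranks / max(values.size - 1, 1)
    out = np.zeros(image.shape, dtype=float)
    out[mask] = (equalized - equalized.mean()) / equalized.std()
    return out

# Hypothetical eye detections and target positions inside a 130 x 150 crop.
M = eye_alignment_matrix((100.0, 180.0), (160.0, 170.0), (40.0, 55.0), (90.0, 55.0))
mapped_left = M @ np.array([100.0, 180.0, 1.0])
mapped_right = M @ np.array([160.0, 170.0, 1.0])

rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(150, 130))   # random stand-in for a cropped face
mask = np.zeros((150, 130), dtype=bool)
mask[20:130, 15:115] = True                    # stand-in for the hair/background mask
normalized = normalize_unmasked(face, mask)
```

The matrix `M` can be passed to any affine warping routine to resample the image; the warping itself is left out to keep the sketch free of image-library dependencies.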

4.2 Spatial to polar frequency domain 

Images were transformed by an FBT up to the 30th Bessel order
and 6th root with an angular resolution of 3°, thus obtaining
372 coefficients. These coefficients correspond to a frequency
range of up to 30 cycles/image of angular frequency and
3 cycles/image of radial frequency. This frequency range was
selected based on earlier tests [31] with the small-size ORL
face database [23]. We tested the FBT descriptors of the whole
image, as well as a combination of the upper right region,
upper middle region, and upper left region (Figure 1).

4.3 Polar frequency to dissimilarity domain 

We built a symmetric dissimilarity matrix D(t,t), defined as
the Euclidean distance between all pairs of training FBT
images t. In this space, each object is represented by its
dissimilarity to all objects. This approach is based on the
assumption that the dissimilarities of similar objects to
"other ones" are about the same [6]. Among other advantages of
this representation space, by fixing the number of features to
the number of objects, it avoids a well-known phenomenon in
which recognition performance degrades when the number of
training samples is small compared to the number of features.

Figure 1. Sample of a normalized whole face image and the
regions that were used for the local analysis
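The dissimilarity representation can be illustrated with a few lines of NumPy. The arrays below are random stand-ins for FBT coefficient vectors, and the function name is hypothetical; only the use of the Euclidean distance comes from the text.

```python
import numpy as np

def dissimilarity(features, prototypes):
    """Euclidean distance from each row of `features` to each row of `prototypes`."""
    diff = features[:, None, :] - prototypes[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=2))

rng = np.random.default_rng(0)
train_fbt = rng.normal(size=(8, 372))   # 8 hypothetical training images, 372 coefficients each
probe_fbt = rng.normal(size=(3, 372))   # 3 hypothetical probe images

D_tt = dissimilarity(train_fbt, train_fbt)   # D(t,t): 8 x 8, symmetric, zero diagonal
D_tx = dissimilarity(probe_fbt, train_fbt)   # D(t,x) for each probe: 3 x 8
```

Each row of `D_tt` or `D_tx` is the new feature vector of one image, so the dimensionality equals the number of training images regardless of how many FBT coefficients were extracted.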

4.4 Classifier 

Test images were classified based on a pseudo Fisher linear
discriminant (FLD) using a two-class approach. An FLD is
obtained by maximizing the (between-subjects
variation)/(within-subjects variation) ratio [13]. Here we used
a minimum-square-error classifier implementation [24], which is
equivalent to the FLD for two-class problems [10]. In these
cases, after shifting the data such that it has zero mean, the
FLD can be defined as

\[
g(x) = \left[ D(t,x) - \tfrac{1}{2}(m_1 + m_2) \right]^{T} S^{-1} (m_1 - m_2) \tag{4}
\]

where x is a probe image, S is the pooled covariance matrix,
and $m_i$ stands for the mean of class i. The probe image x is
classified as corresponding to class 1 if g(x) > 0 and to
class 2 otherwise. However, as the number of training objects
and the number of dimensions are the same in the dissimilarity
space, the sample estimate of the covariance matrix S becomes
singular, and the classifier cannot be built. One solution to
this problem is to use a pseudo-inverse and augmented vectors
[24]. Thus, Eq. 4 is replaced by

\[
g(x) = (D(t,x), 1)\,(D(t,t), I)^{(-1)} \tag{5}
\]

where (D(t,x), 1) is the augmented vector to be classified and
(D(t,t), I) is the augmented training set. The inverse
$(D(t,t), I)^{(-1)}$ is the Moore-Penrose pseudo-inverse, which
gives the minimum-norm solution. The current L-class problem
can be reduced to, and solved by, the two-class solution
described above. The training set was split into L pairs of
subsets, each pair consisting of one subset with images from a
single subject and a second subset formed from all the other
images. A pseudo-FLD was built for each pair of subsets. A
probe image was tested on all L discriminant functions, and a
"posterior probability" score was generated based on the
inverse of the Euclidean distance to each subject.
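A minimal sketch of the one-against-all pseudo-Fisher construction is given below. It assumes the usual minimum-square-error formulation, in which each discriminant is fitted to targets of +1 (the subject) and -1 (all others) through the Moore-Penrose pseudo-inverse of the augmented dissimilarity matrix. The target coding, the argmax decision, the random data, and all function names are assumptions of this example; the conversion of the outputs into "posterior probability" scores is not reproduced.

```python
import numpy as np

def train_pseudo_fld(D_tt, labels):
    """One-vs-rest pseudo-Fisher discriminants in the dissimilarity space.
    Returns the subject ids and a weight matrix with one column per subject."""
    subjects = np.unique(labels)
    augmented = np.hstack([D_tt, np.ones((D_tt.shape[0], 1))])     # (D(t,t), 1)
    targets = np.where(labels[:, None] == subjects[None, :], 1.0, -1.0)
    weights = np.linalg.pinv(augmented) @ targets                   # minimum-norm solution
    return subjects, weights

def discriminant_scores(D_tx, weights):
    """g_l(x) for every probe row of D(t,x) and every subject l."""
    augmented = np.hstack([D_tx, np.ones((D_tx.shape[0], 1))])     # (D(t,x), 1)
    return augmented @ weights

def pairwise_euclidean(a, b):
    """Euclidean distance between every row of a and every row of b."""
    return np.sqrt(np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=2))

rng = np.random.default_rng(0)
gallery = rng.normal(size=(5, 20))                    # one hypothetical image per subject
labels = np.arange(5)
probes = gallery + 0.01 * rng.normal(size=(5, 20))    # slightly perturbed copies

D_tt = pairwise_euclidean(gallery, gallery)
subjects, W = train_pseudo_fld(D_tt, labels)
train_scores = discriminant_scores(D_tt, W)
scores = discriminant_scores(pairwise_euclidean(probes, gallery), W)
predicted = subjects[np.argmax(scores, axis=1)]
```

Because the augmented training matrix has more columns than rows, the pseudo-inverse reproduces the training targets exactly, which is why the covariance matrix never has to be inverted.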



5 Database, preprocessing, and testing procedures

We used the FERET database because of its large number of
individuals and rigid testing protocols, which allow precise
performance comparisons between different algorithms [20]. Here
we compare the performance of our algorithm with a "baseline"
algorithm and with the published results of three successful
approaches [22]. As a baseline we implemented a standard
PCA-based algorithm [25]. The principal components were
computed from a set of 700 images selected randomly from the
gallery subset. Not all 1196 images were used, because of the
large amount of memory that such an operation requires. The
first three principal components, which basically encode
illumination variations [12], were excluded before projecting
the training and test images. The three other approaches are:
Gabor wavelets combined with elastic bunch graph matching
(EBGM) [28], localized facial feature extraction followed by a
Linear Discriminant Analysis (LDA) [8], and a Bayesian
generalization of the LDA method [18].
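The PCA baseline, with the first three principal components discarded, can be sketched as below. The random arrays stand in for vectorized face images, and the function names and the SVD-based computation are choices made for this example rather than details reported for the baseline.

```python
import numpy as np

def fit_pca(train, n_drop=3):
    """Principal axes of `train` (one image per row), without the first `n_drop`."""
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[n_drop:]

def project(images, mean, components):
    """Coordinates of the centered images on the retained principal axes."""
    return (images - mean) @ components.T

rng = np.random.default_rng(0)
train = rng.normal(size=(20, 50))   # stand-in for the 700 vectorized training images
test = rng.normal(size=(4, 50))     # stand-in for probe images
mean, components = fit_pca(train)
train_proj = project(train, mean, components)
test_proj = project(test, mean, components)
```

Working with the SVD of the centered data avoids forming the full pixel-by-pixel covariance matrix, which is the memory bottleneck when the images are large.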

In the FERET protocol, a gallery set of one frontal-view image
from each of 1196 subjects is used to train the algorithm and
a different dataset is used as probe. All images are
gray-scale, 256 x 384 pixels in size. We used the four probe
sets, termed FB, DupI, DupII and FC. The FB dataset consists of
a single image from each of 1195 subjects, taken from the same
subjects as in the gallery set after an interval of a few
seconds, but with a different facial expression. The DupI and
DupII datasets include 722 and 234 images, respectively. The
DupI images were taken immediately or up to 34 months after
the gallery images, while the images in DupII were taken at
least 18 months after the gallery images. The FC subset
contains 194 images of subjects under different lighting
conditions.

The eye coordinates were extracted automatically, as described
in Section 4.1. Approximately 1% of the faces were not
localized, in which cases the eye region coordinates were set
to a fixed value derived from the mean of the located faces.
The final mean error was 3.6 ± 5.1 pixels. In order to
estimate the system performance under minimal localization
errors, we executed a second series of experiments in which
ground-truth information was used. The face registration was
followed by a normalization step, as described in Section 4.1.
The same pre-processing procedure was used in the previous
algorithms, except for the Gabor-EBGM system, where a special
normalization procedure was used.

The performance of the system was evaluated by verification
tests according to the FERET protocol [20]. Given a gallery
image g and a probe image p, the algorithm verifies the claim
that both were taken from the same subject. The verification
probability $P_V$ is the probability of the algorithm accepting
the claim when it is true, and the false-alarm rate $P_F$ is
the probability of incorrectly accepting a false claim. The
algorithm decision depends on the posterior probability score
$s_i(k)$ given to each match, and on a threshold c. Thus, a
claim is confirmed if $s_i(k) < c$ and rejected otherwise. A
plot of all the combinations of $P_V$ and $P_F$ as a function
of c is known as a receiver operating characteristic (ROC).
$P_V$ and $P_F$ were calculated as the number of confirmations
divided by the number of correct or incorrect matches,
respectively. This procedure was repeated for 100 equally
spaced threshold levels. Training and tests were done with the
PRTools toolbox [5].

Figure 2. ROC functions of the FBT, PCA, and
illumination subsets.
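The ROC computation described above can be sketched as follows, using the stated decision rule (a claim is confirmed when its score is below the threshold c) and 100 equally spaced thresholds. The score values are made up for illustration, and spanning the thresholds between the smallest and largest observed score is an assumption of this example.

```python
import numpy as np

def roc_points(genuine_scores, impostor_scores, n_thresholds=100):
    """Verification probability and false-alarm rate over equally spaced thresholds."""
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    all_scores = np.concatenate([genuine, impostor])
    thresholds = np.linspace(all_scores.min(), all_scores.max(), n_thresholds)
    # Fraction of true claims confirmed (score < c) at each threshold.
    p_v = np.array([(genuine < c).mean() for c in thresholds])
    # Fraction of false claims confirmed (score < c) at each threshold.
    p_f = np.array([(impostor < c).mean() for c in thresholds])
    return thresholds, p_v, p_f

# Hypothetical scores for correct (genuine) and incorrect (impostor) matches.
thresholds, p_v, p_f = roc_points([0.1, 0.2, 0.3, 0.7], [0.4, 0.6, 0.8, 0.9])
```

Plotting `p_v` against `p_f` gives the ROC curve; a curve closer to the upper-left corner indicates better verification performance.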

6 Results 

6.1 Semi-automatic system 

Figure 2 shows the performance of the proposed algorithm in
the verification test. On the expression dataset the global
and local FBT versions performed at about the same level as
the previous best and second-best algorithms, respectively. On
both age datasets the FBT algorithms outperformed the previous
algorithms, with the local version being slightly superior. On
the illumination dataset the global and local FBT algorithms
performed as well as or better than the second-best previous
algorithm (PCA+LDA).

6.2 Partial occlusion 

Local approaches to face recognition are in general more
robust to occlusions than global ones (e.g., [15, 17]). To
evaluate this aspect of the proposed algorithm, we occluded
all the normalized test images with a gray mask that covered
>50% of the total area. We tested two masking options: masking
of the right-eye and mouth regions, or masking of the mouth
and nose regions (Fig. 3). Figure 4 shows the effect of
occlusion on the performance of the global and local versions
of the FBT algorithm. The severe occlusions did not much
reduce the performance of the local algorithm on the
expression and age subsets, but significantly affected
performance on the illumination subset. The global version
performed much worse under occlusion conditions on all
subsets. These results confirm the advantage of the local over
the global approach, and demonstrate the high robustness of
the local-FBT under strong occlusion conditions combined with
expression and age variations.

Figure 3. Examples of image



6.3 Automatic system 

Figure |3 shows the performance of the FBT algorithms 
with ground-truth information and when the eyes were detected
automatically. The localization errors introduced in the
latter case reduced the performance of the FBT algorithms by
up to 20%, approximately as much as they affected the PCA
algorithm, which is known to be sensitive to this type of
error [16]. The localization sensitivity of the proposed system
is expected, considering the variance property of the FBT 
to translation | 0- It is interesting to notice, however, that 
under such conditions the advantage of the local over the 
global approach was significantly reduced. 

7 Discussion 

We introduced a fully automated, biologically motivated,
local-based system for face verification tasks. The main
empirical result of this study is the demonstration of the
high performance of a verification system based on FBT
descriptors, especially when these are extracted locally. The
significant advantage of the FBT approach in the age variation
tests is an indication of the robustness of the polar features
in realistic situations of face variation that go beyond
simple facial expression, such as illumination and age. The
superior behavior of the local approach was especially strong
with respect to robustness to occlusion. In the local version,
the mouth region is completely ignored; thus its occlusion or
variation (e.g., due to a new beard or a scarf) does not
affect performance at all. Moreover, the local-FBT
outperformed the global-FBT even when the occluded regions
included face regions that were analyzed by the local version.



The robustness of local analysis to occlusion has been
explored by others. Local PCA was used in [15, 17] to detect
occluded regions in face images. The test images were
classified by comparing the unoccluded regions to
corresponding regions in training images. However, the
combination of FBT features and the local approach has several
advantages over that method besides performance. In our
proposal, there is no need to detect the occluded regions in
the test images. Furthermore, there is no need for special
training strategies [17] or for training specific classifiers
for each test image depending on the occluded region [15].
Finally, there is no need for any classification rule for the
combination of the local features; the FBT features form a
single vector. It is hard to compare our performance results
with those obtained by [15, 17], since their tests were
performed on subsets of fewer than 100 images. Their training
and test images also did not include variations of expression,
illumination, or age. The algorithm of [17] was adapted to
deal with expression variation by weighting local areas
differently and assuming that the facial expression of the
training images is known. In contrast, here we show that the
proposed system can deal simultaneously with expression,
illumination, and age variations, in addition to large-scale
occlusions.

A performance gain of the automatic FBT method can be achieved
by improving the eye localization algorithm. For example, [17]
learned the subspace that represents localization errors
within eigenfaces. This method can be adopted for the FBT
subspace, with the added option of excluding from the final
classification face regions that give high localization
errors.

The relation of the present algorithm to human face
recognition was not directly evaluated here, but a few
associations can be made. As discussed in Section 2, there is
clear evidence that the HVS extracts global radial and angular
shape information, a fact that might look incompatible with
the informative advantage of the local information pooling
shown here. However, little is known about the size of the
global pooling area. A pooling area of 1.2 visual degrees was
suggested for the detection of Glass patterns [27], but the
spatial locations and scale regarding face images remain open
questions.

In the proposed system, the classifier operates in a
non-domain-specific metric space whose coordinates are
similarity relations. The high performance achieved by this
representation indicates that the "real-world" proximity
relations between face images are preserved to a good extent
in the constructed internal space. It is possible that humans
also use an analogous space to represent visual objects. This
hypothesis was studied by correlating the distance between
objects of different shapes as measured by objective and
perceptual parameters (see [7] for a review). Comparison of
the two measurements is usually done by a multidimensional
scaling (MDS) analysis, which projects objects as points in a
two-dimensional space where the distance between the points
approximates the Euclidean distance between the original
objects. For example, in one study [21] objective and
perceptual sorting of face images were highly correlated,
especially when the objective sorting used global features,
such as age and weight of the persons in the images. Similar
results were obtained in a neurophysiological study [29] in
which monkeys were presented with face images. It was found
that the MDS maps obtained from the original images and from
the response patterns of neurons in the inferotemporal cortex
had similar patterns. These results indicate that representing
images in a dissimilarity space can be analogous to human
representation mechanisms.

Figure 4. ROC functions of the FBT on the occluded and not
occluded age, expression and illumination subsets.

In conclusion, the proposed system combines high face 
verification performance for expression, age, and illumina- 
tion tests, and robustness to occlusion. Future investiga- 
tions, using psychophysical methods, should establish the 
level of its relation to biological systems. 